Introduction


The pace of modern life shows that it can hardly be imagined without the Internet and social networks. Much of people's daily activity consists of following world news through social networks and passing it on to each other. Formally, such people are members of a social network, and their interactions (correspondence, common interests, and mutual friends) can be represented as a graph of connections. Such graphs can then be analyzed using mathematical methods. Because social networks change rapidly, random graphs are one of the tools used to study them. The mathematical analysis of social network graphs relies on various measures, including centrality measures of the vertices and edges of the graph.


Real social networks are highly heterogeneous in their organization [1,2]. Their connectivity graphs contain groups of vertices that have more connections within the group than with vertices in other groups. These groups make it possible to distinguish communities in the network. Identifying communities in a social network helps to detect abnormal behavior of its members. In general, there are two approaches to identifying communities in a network: in one, communities cannot intersect; in the other, they may overlap. Overlap arises, for example, when the professional, family, and friendship relations of the participants are all of interest at the same time.


The work closest to the present one is [4], which proposed applying the maximum likelihood method to graph clustering. The approaches differ in what is assumed about the analyzed graph. Here, the real graph is assumed to be generated randomly with given parameters for internal and external links.


The maximum likelihood method, a probabilistic approach widely used in mathematical statistics, can be used to identify communities in a network. Following the approach described in [1], we write a mathematical model for community detection based on this method.


Let us assume that the network is randomly generated and that the number of communities is fixed. We also assume that relationships within a community are closer than relationships between communities. We consider the following parameters:
1) $p_{in}$, the probability of a connection between any two vertices of the same community;
2) $p_{out}$, the probability of a connection between two vertices from different communities.
By maximizing the likelihood over all possible partitions of the network into communities, we obtain the partition that best matches the observed data.
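The random-network assumption can be illustrated with a short generator. In this construction each possible edge is drawn independently, with probability $p_{in}$ inside a community and $p_{out}$ between communities. This is commonly called the planted partition model. The Python sketch below is illustrative and is not the authors' code. The function name planted_partition_graph and its arguments are hypothetical.

```python
import random
from itertools import combinations


def planted_partition_graph(communities, p_in, p_out, seed=None):
    """Draw a random graph with a planted community structure (hypothetical helper).

    communities: list of disjoint vertex lists S_1, ..., S_K.
    Each vertex pair is joined independently: with probability p_in if both
    vertices are in the same community, with probability p_out otherwise.
    Returns the edges as a set of frozensets {i, j}.
    """
    rng = random.Random(seed)
    label = {v: k for k, block in enumerate(communities) for v in block}
    edges = set()
    for i, j in combinations(sorted(label), 2):
        p = p_in if label[i] == label[j] else p_out
        if rng.random() < p:
            edges.add(frozenset((i, j)))
    return edges


# With p_in > p_out, communities come out denser than the links between them.
g = planted_partition_graph([["A", "B", "C", "D"], ["E", "F", "G", "H"]], 0.8, 0.2, seed=1)
print(sorted(tuple(sorted(e)) for e in g))
```

Fixing the seed makes the draw reproducible, which is convenient when the same synthetic network is re-analyzed with different partitions.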


Consider a network $G = (N, E)$ with vertex set $N = \{1, \dots, n\}$. Let $m = m(E)$ be the number of edges of the network, and let the connection between vertices $i$ and $j$ be given by

$$E(i,j) = \begin{cases} 1, & \text{if there is a connection between } i \text{ and } j, \\ 0, & \text{if there is no connection between } i \text{ and } j. \end{cases}$$

By a community $S$ we mean a non-empty subset of network vertices. By a partition $\Pi(N)$ we mean a set of disjoint communities whose union is exactly $N$:

$$\Pi(N) = \{S_1, S_2, \dots, S_K\}, \qquad S_k \cap S_l = \varnothing \ (k \neq l), \qquad \bigcup_{k=1}^{K} S_k = N.$$

Suppose that the real partition of the network is $\Pi = \{S_1, \dots, S_K\}$. Let $n_k = n(S_k)$ and $m_k = m(S_k)$ denote the number of vertices and edges in community $S_k$, $k = 1, \dots, K$, respectively. Then

$$\sum_{k=1}^{K} n_k = n \quad \text{and} \quad \sum_{k=1}^{K} m_k \le m.$$
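For a concrete graph, $n_k$ and $m_k$ can be counted mechanically. The sketch below applies a hypothetical helper, community_counts, to a small made-up five-vertex graph. It checks the two relations above. Edges that are not counted in any $m_k$ are exactly the edges between communities, which is why the second relation is an inequality.

```python
def community_counts(communities, edges):
    """Count n_k and m_k for a partition, plus the totals n and m (hypothetical helper).

    communities: list of disjoint vertex lists S_1, ..., S_K covering all vertices.
    edges:       list of (i, j) pairs, each undirected edge listed once.
    """
    label = {v: k for k, block in enumerate(communities) for v in block}
    n_k = [len(block) for block in communities]
    m_k = [0] * len(communities)
    for i, j in edges:
        if label[i] == label[j]:          # both endpoints lie in the same S_k
            m_k[label[i]] += 1
    return n_k, m_k, len(label), len(edges)


# Made-up graph: triangle a-b-c, edge d-e, and one bridge c-d between communities.
parts = [["a", "b", "c"], ["d", "e"]]
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
n_k, m_k, n, m = community_counts(parts, edges)
assert sum(n_k) == n and sum(m_k) <= m   # the two relations above
print(n_k, m_k, n, m)                    # prints [3, 2] [3, 1] 5 5
```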


Let us express the conditions under which a partition into communities is optimal.


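Simple graph. Consider a community $S_k \in \Pi$. The probability of realizing $m_k$ connections among the $n_k$ vertices of community $S_k$ is

$$p_{in}^{m_k}(1-p_{in})^{\frac{n_k(n_k-1)}{2}-m_k}.$$

Each vertex $i$ in community $S_k$ may have $n - n_k$ connections with vertices from other communities. In fact, it has $\sum_{j \notin S_k} E(i,j)$ such connections. The probability of realizing a network with the given structure is therefore

$$L_\Pi = \prod_{k=1}^{K}\left[p_{in}^{m_k}(1-p_{in})^{\frac{n_k(n_k-1)}{2}-m_k}\prod_{i \in S_k} p_{out}^{\frac{1}{2}\sum_{j \notin S_k} E(i,j)}\,(1-p_{out})^{\frac{1}{2}\left(n-n_k-\sum_{j \notin S_k} E(i,j)\right)}\right] \qquad (1)$$

The factor $\frac{1}{2}$ in the exponents appears because every pair of vertices from different communities is counted twice, once from each endpoint.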


Taking the logarithm of the probability function $L_\Pi$ in (1) and simplifying, we get

$$l_\Pi = \log L_\Pi = \sum_{k=1}^{K} m_k \log p_{in} + \sum_{k=1}^{K}\left(\frac{n_k(n_k-1)}{2} - m_k\right)\log(1-p_{in}) + \left(m - \sum_{k=1}^{K} m_k\right)\log p_{out} + \left(\frac{1}{2}\sum_{k=1}^{K} n_k(n-n_k) - m + \sum_{k=1}^{K} m_k\right)\log(1-p_{out}) \qquad (2)$$
k=l 


The partition $\Pi^*$ at which the function $l_\Pi$ reaches its maximum over all possible partitions is called optimal. The probabilities $p_{in}$ and $p_{out}$, however, are still undetermined: the function $l_\Pi = l_\Pi(p_{in}, p_{out})$ depends on them as arguments. Maximizing $l_\Pi$ with respect to $p_{in}$ and $p_{out}$ yields values that can then be used in numerical calculations.
Statement of the problem 


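Statement 1. For a fixed partition $\Pi$, the function $l_\Pi(p_{in}, p_{out})$ reaches its maximum at

$$p_{in} = \frac{2\sum_{k=1}^{K} m_k}{\sum_{k=1}^{K} n_k^2 - n}, \qquad p_{out} = \frac{2\left(m - \sum_{k=1}^{K} m_k\right)}{n^2 - \sum_{k=1}^{K} n_k^2}. \qquad (3)$$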
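Statement 1 follows from setting the partial derivatives of (2) to zero. The shorthand $M_{in}, P_{in}, M_{out}, P_{out}$ is introduced here only for brevity. It counts the edges and vertex pairs inside and between communities.

```latex
% Shorthand (introduced here for brevity):
%   M_{in}  = \sum_k m_k,        P_{in}  = \tfrac12 \sum_k n_k (n_k - 1)
%   M_{out} = m - \sum_k m_k,    P_{out} = \tfrac12 \sum_k n_k (n - n_k)
l_\Pi = M_{in}\log p_{in} + (P_{in}-M_{in})\log(1-p_{in})
      + M_{out}\log p_{out} + (P_{out}-M_{out})\log(1-p_{out})

\frac{\partial l_\Pi}{\partial p_{in}} = \frac{M_{in}}{p_{in}} - \frac{P_{in}-M_{in}}{1-p_{in}} = 0
  \;\Longrightarrow\; p_{in} = \frac{M_{in}}{P_{in}} = \frac{2\sum_k m_k}{\sum_k n_k^2 - n}

\frac{\partial l_\Pi}{\partial p_{out}} = \frac{M_{out}}{p_{out}} - \frac{P_{out}-M_{out}}{1-p_{out}} = 0
  \;\Longrightarrow\; p_{out} = \frac{M_{out}}{P_{out}} = \frac{2\left(m-\sum_k m_k\right)}{n^2 - \sum_k n_k^2}
```

The last equalities use $\sum_k n_k(n_k-1) = \sum_k n_k^2 - n$ and $\sum_k n_k(n-n_k) = n^2 - \sum_k n_k^2$, both of which follow from $\sum_k n_k = n$. The second derivatives $-M/p^2 - (P-M)/(1-p)^2$ are negative, so each stationary point is a maximum whenever $0 < M < P$.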


We created a model in Maple to calculate the function $l_\Pi$ for each partition. Since several partitions are evaluated, the program starts with the "restart;" command, which clears all previous assignments before a new partition is entered.
Step 1. We enter the number of vertices in each community (that is, n[k]):
n[1] := ...; n[2] := ...; n[3] := ...;

Step 2. We enter the number of edges in each community (that is, m[k]):
m[1] := ...; m[2] := ...; m[3] := ...;

Step 3. We enter the total number of vertices and edges in the graph (that is, y and x):
y := ...; x := ...;

Step 4. We enter formula (2) for the function l_Π. This version is for a partition into three communities, k = 1 .. 3; p[i] stands for p_in and p[o] for p_out:
lp := add(m[k]*ln(p[i]), k = 1 .. 3) + add((1/2*n[k]*(n[k]-1) - m[k])*ln(1-p[i]), k = 1 .. 3) + (x - add(m[k], k = 1 .. 3))*ln(p[o]) + (1/2*add(n[k]*(y-n[k]), k = 1 .. 3) - x + add(m[k], k = 1 .. 3))*ln(1-p[o]);

Step 5. We set the partial derivatives of l_Π with respect to p_in and p_out to zero and solve:
p[i] := solve(diff(lp, p[i]) = 0, p[i]);
p[o] := solve(diff(lp, p[o]) = 0, p[o]);

Step 6. We calculate the value of l_Π at the p_in and p_out found in Step 5:
evalf(lp);
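Steps 1 to 6 can also be mirrored outside Maple. The Python sketch below is an illustration, not the authors' program. It replaces the symbolic solve of Step 5 with the closed-form maximizers (3) and then evaluates formula (2). The helper names are hypothetical. The convention $0 \cdot \log 0 = 0$ is an assumption; it drops terms with zero weight, for example the $p_{out}$ terms when there is only one community.

```python
from math import log


def xlogy(a, b):
    """Return a*log(b), with the convention 0*log(0) = 0 (assumed here)."""
    return 0.0 if a == 0 else a * log(b)


def optimal_log_likelihood(n_k, m_k, n, m):
    """Maximize l_Pi over (p_in, p_out) for one fixed partition (hypothetical helper).

    n_k, m_k: vertices and edges inside each community (Steps 1-2);
    n, m:     vertices and edges of the whole graph (Step 3, y and x).
    Returns (p_in, p_out, l_Pi at the maximum).
    """
    m_in = sum(m_k)                                     # edges inside communities
    pairs_in = sum(nk * (nk - 1) for nk in n_k) / 2     # vertex pairs inside communities
    m_out = m - m_in                                    # edges between communities
    pairs_out = sum(nk * (n - nk) for nk in n_k) / 2    # vertex pairs between communities

    # Step 5: closed-form maximizers from Statement 1, eq. (3).
    p_in = m_in / pairs_in if pairs_in else 0.0
    p_out = m_out / pairs_out if pairs_out else 0.0

    # Step 6: evaluate formula (2) at the maximizer (natural logarithm, like Maple's ln).
    value = (xlogy(m_in, p_in) + xlogy(pairs_in - m_in, 1 - p_in)
             + xlogy(m_out, p_out) + xlogy(pairs_out - m_out, 1 - p_out))
    return p_in, p_out, value


# Two-community partition {A,B,C,D} | {E,F,G,H} of the octagonal example.
print(optimal_log_likelihood([4, 4], [4, 4], 8, 14))
```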


Example. Consider the simple octagonal network shown in Figure 1.

Figure 1. Octagonal network


Let us calculate the value of $l_\Pi$ for different partitions.


First, consider the partition $\Pi = \{A,B,C,D,E,F,G,H\}$, which consists of a single community, and apply the likelihood function (2).


The total number of vertices is $n = 8$ and the total number of edges is $m = 14$. Since there is only one community, $n_1 = 8$ and $m_1 = 14$:


$$l_\Pi = 14\log p_{in} + 14\log(1-p_{in}).$$


We differentiate the function $l_\Pi$ and set its derivative to zero:

$$\frac{14}{p_{in}} - \frac{14}{1-p_{in}} = 0.$$

The maximum of the function is reached at $p_{in} = \frac{1}{2}$, and its value is $-19.408$.
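With natural logarithms (the ln of the Maple code), substituting $p_{in} = \frac{1}{2}$ makes the arithmetic explicit:

```latex
l_\Pi\!\left(\tfrac12\right) = 14\log\tfrac12 + 14\log\tfrac12 = -28\log 2
  \approx -28 \times 0.693147 \approx -19.4081
```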


$$\Pi = \{A,B,C,D\} \cup \{E,F,G,H\}$$

Figure 2. Community {A,B,C,D}    Figure 3. Community {E,F,G,H}


Let us calculate the value of $l_\Pi$ for this partition. The total number of vertices is $n = 8$ and the total number of edges is $m = 14$. Since there are two communities, $n_1 = n_2 = 4$ and $m_1 = m_2 = 4$:
$$l_\Pi = 8\log p_{in} + 4\log(1-p_{in}) + 6\log p_{out} + 10\log(1-p_{out})$$


Here we also differentiate $l_\Pi$ with respect to $p_{in}$ and $p_{out}$ and set the derivatives to zero:


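$$\frac{8}{p_{in}} - \frac{4}{1-p_{in}} = 0, \qquad \frac{6}{p_{out}} - \frac{10}{1-p_{out}} = 0.$$

The maximum of the function is reached at $p_{in} = \frac{2}{3}$ and $p_{out} = \frac{3}{8}$. Its value is approximately $-18.22$.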
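Substituting these maximizers term by term, again with natural logarithms, gives:

```latex
l_\Pi\!\left(\tfrac23, \tfrac38\right)
  = 8\log\tfrac23 + 4\log\tfrac13 + 6\log\tfrac38 + 10\log\tfrac58
  \approx -3.24372 - 4.39445 - 5.88498 - 4.70004
  \approx -18.2232
```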


Conclusions 


The values of $l_\Pi$ for the remaining partitions are given in the following table. The first two rows repeat the cases computed above. The last column is $l_\Pi$ evaluated at the optimal $p_{in}$ and $p_{out}$ from (3).

Partition | n_k; m_k | l_Π(p_in, p_out) | p_in; p_out | max l_Π
{A,B,C,D,E,F,G,H} | n_k = (8); m_k = (14) | 14 log p_in + 14 log(1 - p_in) | p_in = 1/2 | -19.4081
{A,B,C,D} ∪ {E,F,G,H} | n_k = (4,4); m_k = (4,4) | 8 log p_in + 4 log(1 - p_in) + 6 log p_out + 10 log(1 - p_out) | p_in = 2/3; p_out = 3/8 | -18.2232
{A,B} ∪ {C,D,E,F} ∪ {G,H} | n_k = (2,4,2); m_k = (1,4,1) | 6 log p_in + 2 log(1 - p_in) + 8 log p_out + 12 log(1 - p_out) | p_in = 3/4; p_out = 2/5 | -17.9589
{A,B} ∪ {C,D,E,F,G,H} | n_k = (2,6); m_k = (1,8) | 9 log p_in + 7 log(1 - p_in) + 5 log p_out + 7 log(1 - p_out) | p_in = 9/16; p_out = 5/12 | -19.1153
{A,B,C} ∪ {D,E,F,G,H} | n_k = (3,5); m_k = (2,7) | 9 log p_in + 4 log(1 - p_in) + 5 log p_out + 10 log(1 - p_out) | p_in = 9/13; p_out = 1/3 | -17.5718
{A,B,C} ∪ {D,E,F} ∪ {G,H} | n_k = (3,3,2); m_k = (2,3,1) | 6 log p_in + log(1 - p_in) + 8 log p_out + 13 log(1 - p_out) | p_in = 6/7; p_out = 8/21 | -16.8259
{A,B} ∪ {C,D} ∪ {E,F,G,H} | n_k = (2,2,4); m_k = (1,1,4) | 6 log p_in + 2 log(1 - p_in) + 8 log p_out + 12 log(1 - p_out) | p_in = 3/4; p_out = 2/5 | -17.9589
{A,C,F} ∪ {B,D} ∪ {E,G,H} | n_k = (3,2,3); m_k = (2,1,3) | 6 log p_in + log(1 - p_in) + 8 log p_out + 13 log(1 - p_out) | p_in = 6/7; p_out = 8/21 | -16.8259
{A,B} ∪ {C,D} ∪ {E,F} ∪ {G,H} | n_k = (2,2,2,2); m_k = (1,1,1,1) | 4 log p_in + 0·log(1 - p_in) + 10 log p_out + 14 log(1 - p_out) | p_in = 1; p_out = 5/12 | -16.3006
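The last column of the table can be double-checked from the listed counts alone. The hypothetical helper below recomputes $l_\Pi$ at the optimum (3) for each row's $(n_k, m_k)$, with $n = 8$ and $m = 14$, and ranks the partitions. It is a sketch that assumes the tabulated counts are correct; it does not use the octagon's edge list. The printed values agree with the table to about four decimal places.

```python
from math import log


def xlogy(a, b):
    # a*log(b) with the convention 0*log(0) = 0 (assumed)
    return 0.0 if a == 0 else a * log(b)


def max_log_likelihood(n_k, m_k, n=8, m=14):
    """l_Pi at the optimal p_in, p_out of Statement 1 (defaults: the octagon)."""
    m_in, m_out = sum(m_k), m - sum(m_k)
    pairs_in = sum(nk * (nk - 1) for nk in n_k) / 2
    pairs_out = sum(nk * (n - nk) for nk in n_k) / 2
    p_in = m_in / pairs_in if pairs_in else 0.0
    p_out = m_out / pairs_out if pairs_out else 0.0
    return (xlogy(m_in, p_in) + xlogy(pairs_in - m_in, 1 - p_in)
            + xlogy(m_out, p_out) + xlogy(pairs_out - m_out, 1 - p_out))


# (n_k, m_k) for each partition, copied from the table.
TABLE = {
    "{A..H}": ([8], [14]),
    "{A,B,C,D}|{E,F,G,H}": ([4, 4], [4, 4]),
    "{A,B}|{C,D,E,F}|{G,H}": ([2, 4, 2], [1, 4, 1]),
    "{A,B}|{C,D,E,F,G,H}": ([2, 6], [1, 8]),
    "{A,B,C}|{D,E,F,G,H}": ([3, 5], [2, 7]),
    "{A,B,C}|{D,E,F}|{G,H}": ([3, 3, 2], [2, 3, 1]),
    "{A,B}|{C,D}|{E,F,G,H}": ([2, 2, 4], [1, 1, 4]),
    "{A,C,F}|{B,D}|{E,G,H}": ([3, 2, 3], [2, 1, 3]),
    "four pairs": ([2, 2, 2, 2], [1, 1, 1, 1]),
}

scores = {name: max_log_likelihood(nk, mk) for name, (nk, mk) in TABLE.items()}
for name, value in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:26} {value:9.4f}")
```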
It can be seen that the partition $\Pi = \{A,B\} \cup \{C,D\} \cup \{E,F\} \cup \{G,H\}$ gives the most probable community structure for this network.